Action Tree


SDA-PLANNER: State-Dependency Aware Adaptive Planner for Embodied Task Planning

Shen, Zichao, Gao, Chen, Yuan, Jiaqi, Zhu, Tianchen, Fu, Xingcheng, Sun, Qingyun

arXiv.org Artificial Intelligence

Embodied task planning requires agents to produce executable actions in a closed-loop manner within the environment. With the progressively improving capabilities of LLMs in task decomposition, planning, and generalization, current embodied task planning methods adopt LLM-based architectures. However, existing LLM-based planners remain limited in three respects: fixed planning paradigms, a lack of action-sequence constraints, and error-agnostic execution. In this work, we propose SDA-PLANNER, which enables an adaptive planning paradigm with state-dependency-aware and error-aware mechanisms for comprehensive embodied task planning. Specifically, SDA-PLANNER introduces a State-Dependency Graph to explicitly model action preconditions and effects, guiding dynamic plan revision. To handle execution errors, it employs an error-adaptive replanning strategy consisting of Error Backtrack and Diagnosis and Adaptive Action SubTree Generation, which locally reconstructs the affected portion of the plan based on the current environment state. Experiments demonstrate that SDA-PLANNER consistently outperforms baselines in success rate and goal completion, particularly under diverse error conditions.
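
The precondition/effect bookkeeping is easy to picture. Below is a minimal sketch, assuming a simple set-based state model and a toy fetch task (all names hypothetical; the paper's actual graph construction and LLM prompting are not reproduced here), of how a state-dependency check can localize the first failing action so that only the affected subtree needs regeneration:

```python
# Minimal sketch of a state-dependency check in the spirit of SDA-PLANNER
# (hypothetical structures; not the paper's actual data model).
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    preconditions: set = field(default_factory=set)  # facts required before execution
    effects: set = field(default_factory=set)        # facts made true by execution

def executable(action, state):
    """An action is executable iff all its preconditions hold in the state."""
    return action.preconditions <= state

def apply_action(action, state):
    return state | action.effects

def first_failure(plan, state):
    """Simulate the plan; return the index of the first action whose
    preconditions are violated (error backtrack), or None on success."""
    for i, a in enumerate(plan):
        if not executable(a, state):
            return i
        state = apply_action(a, state)
    return None

# Toy task: fetching a cup requires an open cupboard.
open_cupboard = Action("open_cupboard", {"at_cupboard"}, {"cupboard_open"})
grab_cup = Action("grab_cup", {"cupboard_open"}, {"holding_cup"})
plan = [open_cupboard, grab_cup]

# Execution error: the agent never reached the cupboard, so the first
# precondition fails and only the subtree from index 0 needs regeneration.
print(first_failure(plan, state=set()))            # -> 0
print(first_failure(plan, state={"at_cupboard"}))  # -> None (plan succeeds)
```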


ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance

Li, Ying, Wei, Xiaobao, Chi, Xiaowei, Li, Yuming, Zhao, Zhongyu, Wang, Hao, Ma, Ningning, Lu, Ming, Zhang, Shanghang

arXiv.org Artificial Intelligence

While recent advances in robotic manipulation video synthesis have shown promise, significant challenges persist in ensuring effective instruction-following and achieving high visual quality. Recent methods, like RoboDreamer, use linguistic decomposition to divide instructions into separate lower-level primitives, conditioning the world model on these primitives to achieve compositional instruction-following. However, these separate primitives do not account for the relationships among them. Furthermore, recent methods neglect valuable visual guidance, including depth and semantic guidance, both crucial for enhancing visual quality. This paper introduces ManipDreamer, an advanced world model based on an action tree and visual guidance. To better learn the relationships between instruction primitives, we represent each instruction as an action tree and assign embeddings to tree nodes; an instruction then acquires its embedding by navigating through the action tree. The instruction embeddings are used to guide the world model. To enhance visual quality, we combine depth and semantic guidance by introducing a visual guidance adapter compatible with the world model; this adapter enhances both the temporal and physical consistency of video generation. Based on the action tree and visual guidance, ManipDreamer significantly boosts instruction-following ability and visual quality. Comprehensive evaluations on robotic manipulation benchmarks reveal that ManipDreamer achieves large improvements in video quality metrics in both seen and unseen tasks: compared to the recent RoboDreamer model, PSNR improves from 19.55 to 21.05, SSIM improves from 0.7474 to 0.7982, and Flow Error drops from 3.506 to 3.201 on unseen tasks. Additionally, our method increases the success rate of robotic manipulation tasks by 2.5% on average across 6 RLBench tasks.
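
To make the node-embedding idea concrete, here is a small sketch assuming a toy verb/object/target vocabulary and mean-pooling along the tree path (both assumptions; ManipDreamer's actual vocabulary, tree construction, and pooling are not specified here):

```python
# Hedged sketch: deriving an instruction embedding from an action tree by
# averaging learned node embeddings along a root-to-leaf path.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
# One learnable embedding per action-tree node (verb / object / target slots).
node_embedding = {node: rng.normal(size=DIM) for node in
                  ["pick", "place", "block", "cup", "on_table", "in_drawer"]}

# A tiny action tree: verb -> object -> target.
action_tree = {"pick": {"block": ["on_table"], "cup": ["in_drawer"]}}

def instruction_embedding(path):
    """Average the node embeddings collected while walking the tree path."""
    return np.mean([node_embedding[n] for n in path], axis=0)

# "pick the block on the table" navigates pick -> block -> on_table.
z = instruction_embedding(["pick", "block", "on_table"])
print(z.shape)  # (8,) -- a conditioning vector for the world model
```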


Optimizing Navigation And Chemical Application in Precision Agriculture With Deep Reinforcement Learning And Conditional Action Tree

Khosravi, Mahsa, Jiang, Zhanhong, Waite, Joshua R, Jonesc, Sarah, Torres, Hernan, Singh, Arti, Ganapathysubramanian, Baskar, Singh, Asheesh Kumar, Sarkar, Soumik

arXiv.org Artificial Intelligence

This paper presents a novel reinforcement learning (RL)-based planning scheme for optimized robotic management of biotic stresses in precision agriculture. We introduce a domain-specific reward mechanism that maximizes yield recovery while minimizing chemical usage by effectively handling noisy infection data and enforcing physical field constraints via action masking. We conduct a rigorous empirical evaluation across diverse, realistic biotic stress scenarios, capturing varying infection distributions and severity levels in row-crop fields. The proposed scheme is evaluated thoroughly, showing the framework's effectiveness and robustness. Experimental results demonstrate that our approach significantly reduces non-target spraying, chemical consumption, and operational costs compared to baseline methods.
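
The action-masking idea can be sketched in a few lines. The action set, constraint logic, and policy below are toy assumptions for illustration, not the paper's implementation:

```python
# Illustrative sketch of action masking for physical field constraints:
# invalid actions get -inf logits so the policy can never select them.
import numpy as np

ACTIONS = ["forward", "turn_left", "turn_right", "spray"]

def action_mask(at_row_end, infection_detected):
    """1 = allowed, 0 = masked. E.g., turning is only legal at a row end,
    and spraying is only legal where infection was sensed."""
    return np.array([
        0 if at_row_end else 1,          # forward blocked at the field edge
        1 if at_row_end else 0,          # turns only at headlands
        1 if at_row_end else 0,
        1 if infection_detected else 0,  # no non-target spraying
    ])

def masked_policy(logits, mask):
    logits = np.where(mask == 1, logits, -np.inf)
    p = np.exp(logits - logits.max())
    return p / p.sum()

logits = np.array([0.2, 0.1, 0.0, 1.5])
print(masked_policy(logits, action_mask(at_row_end=False,
                                        infection_detected=False)))
# -> all probability on "forward": spraying a healthy spot is masked out.
```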


Iterative Shaping of Multi-Particle Aggregates based on Action Trees and VLM

Lee, Hoi-Yin, Zhou, Peng, Duan, Anqing, Yang, Chenguang, Navarro-Alarcon, David

arXiv.org Artificial Intelligence

In this paper, we address the problem of manipulating multi-particle aggregates using a bimanual robotic system. Our approach enables the autonomous transport of dispersed particles through a series of shaping and pushing actions using robotically controlled tools. Achieving this advanced manipulation capability presents two key challenges: high-level task planning and trajectory execution. For task planning, we leverage Vision Language Models (VLMs) to enable primitive actions such as tool affordance grasping and non-prehensile particle pushing. For trajectory execution, we represent the evolving particle aggregate's contour using truncated Fourier series, providing efficient parametrization of its closed shape. We adaptively compute trajectory waypoints based on group cohesion and the geometric centroid of the aggregate, accounting for its spatial distribution and collective motion. Through real-world experiments, we demonstrate the effectiveness of our methodology in actively shaping and manipulating multi-particle aggregates while maintaining high system cohesion.
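
As a rough illustration of the contour representation, the sketch below treats a closed 2-D contour as a complex signal and keeps only its lowest K Fourier harmonics; the sampling, harmonic count, and smoothing use are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch of a truncated-Fourier-series representation of a closed contour.
import numpy as np

def fourier_descriptors(points):
    """points: (N, 2) closed contour. Returns complex Fourier coefficients
    (index 0 is the centroid / DC term)."""
    z = points[:, 0] + 1j * points[:, 1]
    return np.fft.fft(z) / len(z)

def reconstruct(coeffs, K, N):
    """Rebuild the contour from the K lowest-frequency harmonics."""
    keep = np.zeros_like(coeffs)
    keep[:K + 1] = coeffs[:K + 1]   # DC term + K positive frequencies
    keep[-K:] = coeffs[-K:]         # matching negative frequencies
    z = np.fft.ifft(keep * N)
    return np.stack([z.real, z.imag], axis=1)

# Noisy circle-ish aggregate contour, smoothed by truncation.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
contour = np.stack([np.cos(t), np.sin(t)], 1) + 0.05 * np.random.randn(100, 2)
smooth = reconstruct(fourier_descriptors(contour), K=5, N=100)
print(smooth.shape, smooth.mean(axis=0))  # mean ~ geometric centroid
```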


Tree-Planner: Efficient Close-loop Task Planning with Large Language Models

Hu, Mengkang, Mu, Yao, Yu, Xinmiao, Ding, Mingyu, Wu, Shiguang, Shao, Wenqi, Chen, Qiguang, Wang, Bin, Qiao, Yu, Luo, Ping

arXiv.org Artificial Intelligence

This paper studies closed-loop task planning, which refers to the process of generating a sequence of skills (a plan) to accomplish a specific goal while adapting the plan based on real-time observations. Recently, prompting Large Language Models (LLMs) to generate actions iteratively has become a prevalent paradigm due to its superior performance and user-friendliness. However, this paradigm is plagued by two inefficiencies: high token consumption and redundant error correction, both of which hinder its scalability for large-scale testing and applications. To address these issues, we propose Tree-Planner, which reframes task planning with LLMs into three distinct phases: plan sampling, action tree construction, and grounded deciding. Tree-Planner starts by using an LLM to sample a set of potential plans before execution, then aggregates them into an action tree. Finally, the LLM performs a top-down decision-making process on the tree, taking real-time environmental information into account. Experiments show that Tree-Planner achieves state-of-the-art performance while maintaining high efficiency. By decomposing LLM queries into a single plan-sampling call and multiple grounded-deciding calls, a considerable part of the prompt is no longer repeatedly consumed. As a result, token consumption is reduced by 92.2% compared to the previously best-performing model. Additionally, by enabling backtracking on the action tree as needed, the correction process becomes more flexible, leading to a 40.5% decrease in error corrections. Project page: https://tree-planner.github.io/
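
The aggregation and deciding phases can be sketched compactly. The following toy example (hypothetical action names, with a stubbed feasibility check standing in for grounded observation) merges sampled plans into a prefix tree and walks it top-down with backtracking:

```python
# Minimal sketch of plan aggregation and grounded deciding: sampled plans are
# merged into a prefix tree ("action tree"); a decider walks it depth-first,
# backtracking when a branch is infeasible in the observed environment.
def build_action_tree(plans):
    tree = {}
    for plan in plans:
        node = tree
        for action in plan:
            node = node.setdefault(action, {})
    return tree

def grounded_decide(tree, feasible, path=()):
    """Try children in order, backtrack on dead ends; return the first
    fully feasible root-to-leaf plan, or None."""
    if not tree:
        return list(path)
    for action, subtree in tree.items():
        if feasible(action, path):
            result = grounded_decide(subtree, feasible, path + (action,))
            if result is not None:
                return result
    return None  # backtrack

plans = [["goto_kitchen", "open_fridge", "grab_milk"],
         ["goto_kitchen", "open_cabinet", "grab_cup"],
         ["goto_pantry", "grab_cereal"]]
tree = build_action_tree(plans)  # shared "goto_kitchen" prefix stored once

# Observation says the fridge is jammed, so the decider backtracks from
# "open_fridge" and follows the cabinet branch instead.
print(grounded_decide(tree, lambda a, _: a != "open_fridge"))
# -> ['goto_kitchen', 'open_cabinet', 'grab_cup']
```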


PyTAG: Challenges and Opportunities for Reinforcement Learning in Tabletop Games

Balla, Martin, Long, George E. M., Jeurissen, Dominik, Goodman, James, Gaina, Raluca D., Perez-Liebana, Diego

arXiv.org Artificial Intelligence

In recent years, Game AI research has made important breakthroughs using Reinforcement Learning (RL). Despite this, RL for modern tabletop games has gained little to no attention, even though these games offer a range of unique challenges compared to video games. To bridge this gap, we introduce PyTAG, a Python API for interacting with the Tabletop Games framework (TAG). TAG contains a growing set of more than 20 modern tabletop games with a common API for AI agents. We present techniques for training RL agents in these games and introduce baseline results after training Proximal Policy Optimisation algorithms on a subset of games. Finally, we discuss the unique challenges that complex modern tabletop games provide, now open to RL research through PyTAG.


Generalising Discrete Action Spaces with Conditional Action Trees

Bamford, Christopher, Ovalle, Alvaro

arXiv.org Artificial Intelligence

Relatively few conventions are followed in reinforcement learning (RL) environments to structure action spaces. As a consequence, applying RL algorithms to tasks with large, multi-component action spaces requires additional effort to adjust to different formats. In this paper we introduce Conditional Action Trees with two main objectives: (1) as a method of structuring action spaces in RL to generalise across several action space specifications, and (2) to formalise a process that significantly reduces the action space by decomposing it into multiple sub-spaces, favoring a multi-staged decision-making approach. We show several proof-of-concept experiments validating our scheme, ranging from environments with basic discrete action spaces to those with the large combinatorial action spaces commonly found in RTS-style games.
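
A toy example helps show the reduction. The sketch below (with invented unit/verb/target names) stages a flat combinatorial action space into conditional sub-spaces, so each decision step faces only a handful of options:

```python
# Toy conditional action tree: instead of one flat space of
# (unit x verb x target) tuples, the agent decides in stages, and each
# choice conditions the next sub-space.
CONDITIONAL_TREE = {
    "worker":  {"move":   ["north", "south", "east", "west"],
                "build":  ["barracks", "farm"]},
    "soldier": {"move":   ["north", "south", "east", "west"],
                "attack": ["nearest_enemy"]},
}

def valid_stage_options(partial_choice):
    """Return the legal options for the next decision stage, conditioned on
    what has been chosen so far -- effectively a per-stage action mask."""
    node = CONDITIONAL_TREE
    for choice in partial_choice:
        node = node[choice]
    return list(node) if isinstance(node, dict) else node

print(valid_stage_options([]))                   # ['worker', 'soldier']
print(valid_stage_options(["worker"]))           # ['move', 'build']
print(valid_stage_options(["worker", "build"]))  # ['barracks', 'farm']
# Flat space: every (unit, verb, target) combination at once; staged:
# at most four options per decision step.
```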


CraftAssist Instruction Parsing: Semantic Parsing for a Minecraft Assistant

Jernite, Yacine, Srinet, Kavya, Gray, Jonathan, Szlam, Arthur

arXiv.org Artificial Intelligence

We propose a large-scale semantic parsing dataset focused on instruction-driven communication with an agent in Minecraft. We describe the data collection process, which yields an additional 35K human-generated instructions with their semantic annotations. We report the performance of three baseline models and find that while a dataset of this size helps us train a usable instruction parser, it still poses interesting generalization challenges, which we hope will help develop better and more robust models.
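
For readers unfamiliar with the task, the following schematic shows the general shape of such annotations: a natural-language command mapped to a tree-structured action dictionary. The schema below is a simplified illustration, not the dataset's exact format:

```python
# Schematic example of instruction-driven semantic parsing (simplified,
# hypothetical schema -- not the CraftAssist annotation format itself).
instruction = "build a small red house next to the tree"

parse = {
    "action_type": "BUILD",
    "schematic": {"name": "house", "size": "small", "color": "red"},
    "location": {"relation": "next_to",
                 "reference_object": {"name": "tree"}},
}

ALLOWED_KEYS = {"action_type", "schematic", "location", "name", "size",
                "color", "relation", "reference_object"}

def lint_parse(node):
    """A parser's output can be checked structurally even without execution."""
    assert all(k in ALLOWED_KEYS for k in node)
    for v in node.values():
        if isinstance(v, dict):
            lint_parse(v)

lint_parse(parse)
print(parse["location"]["reference_object"]["name"])  # -> "tree"
```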


A Complete Epistemic Planner without the Epistemic Closed World Assumption

Wan, Hai (Sun Yat-sen University) | Yang, Rui (Sun Yat-sen University) | Fang, Liangda (Sun Yat-sen University) | Liu, Yongmei (Sun Yat-sen University) | Xu, Huada (Sun Yat-sen University)

AAAI Conferences

Planning with epistemic goals has received attention from both the dynamic logic and planning communities. In the single-agent case, under the epistemic closed-world assumption (ECWA), epistemic planning can be reduced to contingent planning. However, it is inappropriate to make the ECWA in some epistemic planning scenarios, for example, when the agent is not fully introspective, or when the agent wants to devise a generic plan that applies to a wide range of situations. In this paper, we propose a complete single-agent epistemic planner without the ECWA. We identify two normal forms of epistemic formulas: weak minimal epistemic DNF and weak minimal epistemic CNF, and present progression and entailment algorithms based on these normal forms. We adapt the PrAO algorithm for contingent planning from the literature as the main planning algorithm and develop a complete epistemic planner called EPK. Our experimental results show that EPK can generate solutions effectively for most of the epistemic planning problems we have considered, including those without the ECWA.
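
For orientation, here is a generic illustration (simplified wording, not the paper's exact normal forms or definitions) of the kind of epistemic goal involved and of what dropping the ECWA changes:

```latex
% Single-agent epistemic language with knowledge operator K ("the agent knows"):
%   \varphi ::= p \mid \lnot\varphi \mid \varphi \land \varphi \mid K\varphi
% A typical "know-whether" goal, achievable by a contingent plan with sensing:
\[
  \mathit{goal} \;=\; K p \,\lor\, K \lnot p
\]
% Informally, the ECWA lets the initial KB settle what the agent does and
% does not know; without the ECWA the KB may entail neither K p nor
% \lnot K p, so a complete planner must handle initial states that leave
% the agent's own knowledge open.
```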